Probably Approximately Correct Search

Authors

  • Ingemar J. Cox
  • Ruoxun Fu
  • Lars Kai Hansen
Abstract

We consider the problem of searching a document collection using a set of independent computers. That is, the computers do not cooperate with one another either (i) to acquire their local index of documents or (ii) during the retrieval of a document. During the acquisition phase, each computer is assumed to randomly sample a subset of the entire collection. During retrieval, the query is issued to a random subset of computers, each of which returns its results to the query-issuer, who consolidates the results. We examine how the number of computers and the fraction of the collection that each computer indexes affect performance in comparison to a traditional deterministic configuration. We provide analytic formulae that, given the number of computers and the fraction of the collection each computer indexes, give the probability of an approximately correct search, where a “correct search” is defined to be the result of a deterministic search on the entire collection. We show that the randomized distributed search algorithm can have acceptable performance under a range of parameter settings. Simulation results confirm our analysis.
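The setup described in the abstract can be illustrated with a small sketch. It is not the paper's formulae, which are not reproduced on this page. It assumes that each computer indexes a uniformly random fraction of the collection, independently of the others, so a given document is missed by all of k queried computers with probability (1 - f)^k. The function names, parameters, and the convention that documents 0..top_r-1 form the "correct" result set are hypothetical, chosen only for illustration.

```python
import random


def expected_coverage(fraction_indexed: float, computers_queried: int) -> float:
    """Probability that a given document is held by at least one queried computer.

    Assumption (not taken from the paper): each computer indexes every
    document independently with probability `fraction_indexed`.
    """
    return 1.0 - (1.0 - fraction_indexed) ** computers_queried


def simulate_coverage(num_docs: int, fraction_indexed: float,
                      computers_queried: int, top_r: int,
                      trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of the correct results retrieved.

    Hypothetical convention: documents 0..top_r-1 are the results that a
    deterministic search over the whole collection would return.
    """
    rng = random.Random(seed)
    sample_size = round(fraction_indexed * num_docs)
    found_total = 0
    for _ in range(trials):
        found = set()
        for _ in range(computers_queried):
            # Each queried computer holds its own independent random sample.
            local_index = rng.sample(range(num_docs), sample_size)
            found.update(d for d in local_index if d < top_r)
        found_total += len(found)
    return found_total / (trials * top_r)


if __name__ == "__main__":
    analytic = expected_coverage(0.1, 10)
    simulated = simulate_coverage(1000, 0.1, 10, top_r=10, trials=300, seed=1)
    print(f"analytic={analytic:.3f} simulated={simulated:.3f}")
```

Under these assumptions, querying 100 computers that each index 1% of the collection gives an expected coverage of about 0.63. That is close to the 63% figure quoted for PAC search in one of the related abstracts listed below, although this page does not state how that figure was derived.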

Similar resources

Search-Aware Conditions for Probably Approximately Correct Heuristic Search

The notion of finding a solution that is approximately optimal with high probability was recently introduced to the field of heuristic search, formalized as Probably Approximately Correct Heuristic Search, or PAC search in short. A big challenge when constructing a PAC search algorithm is to identify when a given solution achieves the desired sub-optimality with the required confidence, allowin...

Improving Query Correctness Using Centralized Probably Approximately Correct (PAC) Search

A non-deterministic architecture for information retrieval, known as probably approximately correct (PAC) search, has recently been proposed. However, for equivalent storage and computational resources, the performance of PAC is only 63% of a deterministic system. We propose a modification to the PAC architecture, introducing a centralized query coordination node. To respond to a query, random ...

Probably Approximately Correct Heuristic Search

A* is a best-first search algorithm that returns an optimal solution. w-admissible algorithms guarantee that the returned solution is no larger than w times the optimal solution. In this paper we introduce a generalization of the w-admissibility concept that we call PAC search, which is inspired by the PAC learning framework in Machine Learning. The task of a PAC search algorithm is to find a s...

On the Feasibility of Unstructured Peer-to-Peer Information Retrieval

We consider the feasibility of web-scale search in an unstructured peer-to-peer network. Since the network is unstructured, any such search is probabilistic in nature. We therefore adopt a probably approximately correct (PAC) search framework. The accuracy of such a search is defined by the overlap between the set of documents retrieved by a PAC search and the set of documents retrieved by an e...

On the Usability of Probably Approximately Correct Implication Bases

We revisit the notion of probably approximately correct implication bases from the literature and present a first formulation in the language of formal concept analysis, with the goal to investigate whether such bases represent a suitable substitute for exact implication bases in practical use-cases. To this end, we quantitatively examine the behavior of probably approximately correct implicati...


Publication date: 2009